
SEA Shark and Ray Research and Conservation Workshop


Introductions

Who am I?


Vinay is a Research Scientist at the Australian Institute of Marine Science. He is an ecologist who is particularly interested in using spatio-temporal datasets to understand animal movement and distribution patterns. He has considerable experience using R to analyse and visualise large and complex spatial datasets. He has developed R code and packages to analyse two- and three-dimensional movement patterns of animals using acoustic telemetry data, from single study sites to continental-scale arrays. Vinay’s R code can be found on his GitHub page.




Course outline

In this course you will learn about different ways to analyse and interpret your aquatic telemetry datasets using R. This workshop will demonstrate how R can make the processing of spatial data much quicker and easier than using standard GIS software! At the end of this workshop you will also have annotated R code that you can re-run at any time, share with collaborators and build on as you acquire new data!

I designed this course not to comprehensively cover all the tools in R, but rather to give you an understanding of options on how to analyse your telemetry data (both from satellite and acoustic tags). Every new project comes with its own problems and questions and you will need to be independent, patient and creative to solve these challenges. It makes sense to invest time in becoming familiar with R, because today R is the leading platform for environmental data analysis and has some other functionalities which may surprise you!


This R workshop is intended to run across 3 sessions.


  • Session 1: Getting familiar with R and spatial data
  1. A brief introduction to R
  2. Import and explore datasets using %pipes% and the tidyverse group of R packages
  3. Working with Spatial objects using the sf and mapview R packages


  • Session 2: Working with satellite telemetry data
  1. Understanding the data structure from satellite tags
  2. Processing satellite tag data using the aniMotum package
  3. Visualising satellite tag data using the ggspatial package


  • Session 3: Working with passive acoustic telemetry data
  1. Understanding the data structure from acoustic telemetry data
  2. Using the VTrack R package to explore patterns in animal detections and dispersal
  3. Using the remora R package to interactively explore your telemetry data




Course Resources

The course resources will be emailed to you prior to the workshop. You can also download the data and scripts we will work through in this course from this GitHub repository page. The page contains the course documents, example telemetry data and R scripts we are going to work with. To download the folder, click on the green “Code” dropdown menu and select “Download ZIP”.






Session 1

Getting familiar with R and spatial data



1.1 A brief Introduction to R

The process of turning raw telemetry data into publishable results is highly involved. Tracking datasets are becoming larger and larger as they are gathered over longer time periods, over larger spatial extents and at increasing temporal resolutions. While this increases our ability to detect subtle patterns, these datasets are becoming vast and require analytical tools that can easily handle, manipulate and visualise them.

Processing and analysing telemetry datasets can require a huge investment in time: rearranging data, removing erroneous values, purchasing, downloading and learning new software, and running analyses. Furthermore, merging Excel spreadsheets, filtering data, and preparing data for statistical analyses and plotting in different software packages can introduce all sorts of errors.

R is a powerful language for data wrangling and analysis because…

  1. It is relatively fast to run and process commands
  2. You can create repeatable scripts
  3. You can trace errors back to their source
  4. You can share your scripts with other people
  5. It is easy to identify errors in large data sets
  6. Having your data in R opens up a huge array of cutting edge analysis tools.
  7. R is also totally FREE!

As R is open source, the more people we can get helping out on the R mailing lists (e.g. R-sig-geo) and contributing their own packages to the wider community, the more powerful R becomes!

For this course, we assume you have a basic understanding of the R environment and of working with RStudio.


1.1.1 Installing packages

Part of the reason R has become so popular is the vast array of packages that are freely available and highly accessible. In the last few years, the number of packages has grown exponentially, to more than 10,000 on CRAN! These can help you to do a galaxy of different things in R, including running complex analyses, drawing beautiful figures, running R as a GIS, constructing your own R packages, building web pages and even writing R course handbooks like this one!

Let’s suppose you want to use the sf package to access its incredible spatial functionality. If this package is not already installed on your machine, you can download it from the web by using the following command in R.

install.packages("sf", repos='http://cran.us.r-project.org')

In this example, sf is the package to be downloaded and ‘http://cran.us.r-project.org’ is the repository where the package will be accessed from.

Multiple packages can be installed at the same time by listing the required packages in a vector…

install.packages(c("tidyverse",
                   "sf",
                   "ggspatial",
                   "mapview"), repos='http://cran.us.r-project.org')


More recently, package developers have also used other platforms like GitHub to house R packages. This enables users to access packages that are actively being updated, and enables developers to fix problems and develop new features with user feedback.

The remotes and devtools R packages allow packages to be installed directly from platforms like GitHub. For example, if we want to download the VTrack package from its GitHub repository, we can use the install_github() function like this:

remotes::install_github("rossdwyer/VTrack")


Once we have installed the packages we want to use, we need to load them using the library() function.

library(tidyverse)
library(sf)
library(ggspatial)
library(mapview)
library(VTrack)
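
If you are unsure which of these packages are already on your machine, you can check before installing anything. The sketch below is one possible approach and is not part of the course scripts: the helper name missing_packages() is made up for illustration, and it relies only on the base R function requireNamespace(). Note that requireNamespace() quietly loads (but does not attach) any package it finds.

```r
# Hypothetical helper (not from the course scripts): return the names of
# the packages in `pkgs` that are not yet installed on this machine
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

cran_pkgs <- c("tidyverse", "sf", "ggspatial", "mapview")
to_install <- missing_packages(cran_pkgs)
print(to_install)

# Uncomment to install only the packages that are missing
# if (length(to_install) > 0) install.packages(to_install)
```

This keeps repeated runs of a script quick, because packages that are already installed are not downloaded again.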



1.2. Import and explore datasets using %pipes% and the tidyverse group of R packages


In this session we are going to work with a data set containing detection data from 3 Australian Blacktip Sharks (Carcharhinus tilstoni) shown in the image above. These animals were captured and tagged within Cleveland Bay, Townsville roughly one month prior to the landfall of Cyclone Yasi in 2011. Blacktip sharks were tracked using a network of acoustic hydrophones deployed in a grid pattern on the East and West side of Cleveland Bay.

Telemetry data from these sharks were analysed alongside 45 others from five species to examine movement patterns of coastal sharks before, during and after three extreme weather events in Australia (Cyclone Yasi and Tropical Storm Anthony, 2011) and the US (Tropical Storm Gabrielle, 2001). You can read more about that study here.


The web map of detection data we will explore by the end of Session 1


Before we can analyse these data, we first need to read this dataset into R. As with most acoustic detection datasets exported from VUE or other acoustic telemetry data management software, our dataset is in the ‘comma separated value’ (.csv) format.

A .csv file can simply be imported into R using the read.csv base function, and by telling R which file to load (Blacktip_ClevelandBay.csv) and where to find it (i.e. in the ‘Data’ folder).

# Load the blacktip shark data using base read.csv function
blacktip <- read.csv('data/Session 1/Blacktip_ClevelandBay.csv', header = TRUE)


A note about Excel files

Don’t use ‘.xlsx’ or ‘.xls’ files for saving data. The problem with ‘.xls’ and ‘.xlsx’ files is that they store extra information with the data, which makes files larger than necessary, and Excel can also unwittingly reformat or alter your data!

A stable way to save your data is as a ‘.csv’ file. These are simply values separated by ‘commas’ and rows defined by ‘returns’. If you select ‘Save as’ in Excel, you can choose ‘.csv’ as one of the options. If you open the .csv file provided in the ‘Data’ folder using a text editor, you will see it is just words, numbers and commas.
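
To see this for yourself without leaving R, the short sketch below writes a tiny made-up table (the values are invented for illustration, not taken from the course data) to a temporary .csv file and prints the raw lines of text. It uses only base R functions.

```r
# Toy table (values invented for illustration)
toy <- data.frame(
  station_name = c("E18", "E2"),
  detections = c(3L, 2L),
  stringsAsFactors = FALSE
)

# Write the table to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# A .csv file is plain text: one line per row, values separated by commas
csv_lines <- readLines(csv_path)
print(csv_lines)

# Reading the file back in recovers the same table
toy_again <- read.csv(csv_path, stringsAsFactors = FALSE)
print(toy_again)
```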




What is the tidyverse?

The tidyverse is the collective name given to a suite of R packages designed mostly by Hadley Wickham. This is becoming an increasingly popular set of packages that share an underlying design philosophy, grammar, and data structure. You can learn more about all the features of these packages from the free online course developed by the package creators here.


Members of the tidyverse

broom, dplyr, forcats, ggplot2, haven, httr, hms, jsonlite, lubridate, magrittr, modelr, purrr, readr, readxl, stringr, tibble, rvest, tidyr, xml2

The advantage of the tidyverse is that most of these packages (but not all!) can be loaded simultaneously using a single line of code

library(tidyverse)


The tidyverse equivalent of the above code is the read_csv() function from the readr package. The main difference is that the data are imported as a tibble, a modern form of data frame. An advantage of read_csv() is that it guesses the most suitable type for each column (e.g. date-times, numbers or text) as it reads the file, although it is worth checking that the guesses are sensible.

blacktip <- read_csv('data/Session 1/Blacktip_ClevelandBay.csv')

# You can also use read_csv to input data directly from a website URL
blacktip <- read_csv('https://raw.githubusercontent.com/vinayudyawer/SEA-workshop2023/main/data/Session%201/Blacktip_ClevelandBay.csv')

head(blacktip)
## # A tibble: 6 × 10
##   date_time           receiver   transmitter transmitter_name transmitter_serial
##   <dttm>              <chr>      <chr>       <chr>                         <dbl>
## 1 2011-01-23 16:41:31 VR2-5052   A69-1303-6… Ana                        11237964
## 2 2011-01-23 16:43:35 VR2-5052   A69-1303-6… Ana                        11237964
## 3 2011-01-23 16:48:25 VR2-5052   A69-1303-6… Ana                        11237964
## 4 2011-01-23 21:07:34 VR2W-1049… A69-1303-6… Ana                        11237964
## 5 2011-01-23 21:12:06 VR2W-1049… A69-1303-6… Ana                        11237964
## 6 2011-01-24 10:57:27 VR2W-1041… A69-1303-6… Ana                        11237964
## # ℹ 5 more variables: sensor_value <lgl>, sensor_unit <lgl>,
## #   station_name <chr>, latitude <dbl>, longitude <dbl>
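
Notice that the empty sensor_value and sensor_unit columns were guessed as logical (<lgl>). If you would rather state the column types yourself, read_csv() accepts a col_types argument. The sketch below is only an illustration, assuming readr version 2.0 or later (for the I() literal-data shortcut); it uses a two-row excerpt in the same layout as the course file rather than the file itself, and the chosen types are an assumption about what you might want.

```r
library(readr)

# Two-row excerpt in the same layout as the course file; I() tells readr
# that this string is the data itself rather than a file path
csv_text <- I(paste(
  "date_time,station_name,sensor_value",
  "2011-01-23 16:41:31,E18,",
  "2011-01-23 16:43:35,E18,",
  sep = "\n"
))

# Default behaviour: an all-empty column is guessed as logical
guessed <- read_csv(csv_text, show_col_types = FALSE)

# Explicit column types: sensor_value is read as a number instead
typed <- read_csv(
  csv_text,
  col_types = cols(
    date_time = col_datetime(),
    station_name = col_character(),
    sensor_value = col_double()
  )
)

class(guessed$sensor_value)
class(typed$sensor_value)
```

Setting types explicitly is mostly useful when the same script will later be run on files where those columns do contain values.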


Pipes %>%


Now that we’ve successfully loaded our tracking dataset, let’s start having a closer look at the data using pipes %>%

  • The pipe originally comes from the magrittr package and has been imported into the tidyverse.
  • %>% is an infix operator. This means it takes two operands, left and right.
  • It ‘pipes’ the output of the last expression/function (left) forward to the first input of the next function (right).
# For example, to see what class our data is in, we could use this code...
class(blacktip)

# Alternatively in the tidyverse we could use this code...
blacktip %>% class()
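
To make the left-to-right reading order concrete, here is a small sketch on a made-up numeric vector (the numbers are invented for illustration). It loads magrittr directly so that it runs on its own; %>% is also available after library(tidyverse).

```r
library(magrittr)  # provides %>%; also available after library(tidyverse)

# Made-up detection counts
detections <- c(12, 7, 30, 4)

# Nested call: has to be read from the inside out
nested <- round(mean(sqrt(detections)), 1)

# Piped call: reads from left to right, one step at a time
piped <- detections %>% sqrt() %>% mean() %>% round(1)

# Both approaches give exactly the same result
identical(nested, piped)
```

In the piped version, the value on the left becomes the first argument of the function on the right, so round(1) is evaluated as round(x, 1).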


Benefits of pipes %>%

  • Functions flow in a natural order that tells a story about the data.
  • Code effects are easy to reason about by inserting View() or head() into the pipe chain.
  • A common style makes it easy to understand a collaborator’s (or your own) code.

We can have a quick look at the data by typing:

# Now insert functions into the pipe chain
blacktip %>% View()
blacktip %>% head() # first 6 rows by default
blacktip %>% tail(10) # specify we want to look at the last 10 rows

This functionality is particularly useful if the data is very large!

Note the (), as opposed to the [] we used for indexing. The () signify a function.

We can look at the data more closely using the nrow(), ncol(), length(), unique(), str() and summary() functions.

blacktip %>% nrow() # number of rows in the data frame
blacktip %>% ncol() # number of columns in the data frame
blacktip %>% str() # provides internal structure of an R object
blacktip %>% summary() # provides result summary of the data frame
# pipes can be used for single column within data frames
blacktip$transmitter_name <-
  blacktip$transmitter_name %>% as.factor()

# pipes are used to conduct multiple functions on the dataset in a certain order
blacktip %>% 
  subset(transmitter_name == "Colin") %>% # subset dataset to include only detections by 'Colin'
  nrow() # number of rows (i.e. detections) from 'Colin'

Pipes can also be used to pre-process our data before plotting them. Let’s now use pipes to draw a simple barplot of the number of Colin’s detections at each receiver.

blacktip %>% 
  subset(transmitter_name == "Colin") %>% # subset dataset to include only detections by 'Colin'
  with(table(station_name)) %>% # create a table with the number of rows (i.e. detections) per receiver
  barplot(las = 2, xlab = "Receiver station", ylab = "Number of Detections") # barplot of number of Colin's detections recorded per receiver


dplyr

  • dplyr is the data wrangling workhorse of the tidyverse.
  • Provides functions, verbs, that can manipulate data into the shape you need for analysis.
  • Has many backends allowing dplyr code to work on data stored in SQL databases and big data clusters.
    • Works via translation to SQL. Keep an eye out for the SQL flavour in dplyr

Basic vocabulary

  • select() columns from a tibble
  • filter() to rows matching a certain condition
  • arrange() rows in order
  • mutate() a tibble by changing or adding columns
  • group_by() a variable
  • summarise() data over a group using a function

Check out this useful online cheatsheet for data wrangling.


select

We can use the select function in dplyr to choose the columns we want to include for our analyses and plotting

# Select the columns we are interested in
blacktip <- 
  blacktip %>% 
  select(date_time, latitude, longitude, receiver, station_name, transmitter_name, transmitter, sensor_value) %>% # columns we want to include
  select(-sensor_value) # the minus symbol denotes columns we want to drop

head(blacktip)
## # A tibble: 6 × 7
##   date_time           latitude longitude receiver  station_name transmitter_name
##   <dttm>                 <dbl>     <dbl> <chr>     <chr>        <fct>           
## 1 2011-01-23 16:41:31    -19.3      147. VR2-5052  E18          Ana             
## 2 2011-01-23 16:43:35    -19.3      147. VR2-5052  E18          Ana             
## 3 2011-01-23 16:48:25    -19.3      147. VR2-5052  E18          Ana             
## 4 2011-01-23 21:07:34    -19.3      147. VR2W-104… E2           Ana             
## 5 2011-01-23 21:12:06    -19.3      147. VR2W-104… E2           Ana             
## 6 2011-01-24 10:57:27    -19.3      147. VR2W-104… E1           Ana             
## # ℹ 1 more variable: transmitter <chr>


filter and arrange

We can use these functions to subset the data to rows matching logical conditions and then arrange according to particular attributes

blacktip %>%
  filter(transmitter_name == "Ana") %>%
  arrange(date_time) # arrange Ana's detections in chronological order

blacktip %>%
  filter(transmitter_name == "Bruce") %>%
  arrange(desc(date_time)) # arrange Bruce's detections in descending chronological order


group_by and summarise

Determine the total number of detections for each tagged shark

blacktip %>%
  group_by(transmitter_name) %>%
  summarise(NumDetections = n()) # summarise number of detections per tagged shark

blacktip %>%
  group_by(transmitter_name, station_name) %>%
  summarise(NumDetections = n()) # summarise number of detections per shark at each receiver
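
Counting rows per group is so common that dplyr also offers count() as a shorthand for group_by() followed by summarise(n()). The sketch below compares the two on a small made-up tibble (the rows are invented for illustration; only the column names follow the blacktip data). It also sets .groups = "drop" so that summarise() returns an ungrouped result.

```r
library(dplyr)

# Made-up detections (names follow the blacktip columns, rows are invented)
toy <- tibble(
  transmitter_name = c("Ana", "Ana", "Ana", "Bruce", "Bruce"),
  station_name = c("E18", "E18", "E2", "E1", "E1")
)

# Long form: group, then count the rows in each group
by_summarise <- toy %>%
  group_by(transmitter_name, station_name) %>%
  summarise(NumDetections = n(), .groups = "drop")

# Short form: count() does the grouping and counting in one step
by_count <- toy %>%
  count(transmitter_name, station_name, name = "NumDetections")

by_summarise
by_count
```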


mutate

Adding and removing columns in the data frame through a pipe

blacktip <- 
  blacktip %>%
  mutate(date = as.Date(date_time)) %>% # adding a column to the blacktip data with date of each detection
  mutate(transmitter = NULL) # removing the `transmitter` column

head(blacktip)
## # A tibble: 6 × 7
##   date_time           latitude longitude receiver  station_name transmitter_name
##   <dttm>                 <dbl>     <dbl> <chr>     <chr>        <fct>           
## 1 2011-01-23 16:41:31    -19.3      147. VR2-5052  E18          Ana             
## 2 2011-01-23 16:43:35    -19.3      147. VR2-5052  E18          Ana             
## 3 2011-01-23 16:48:25    -19.3      147. VR2-5052  E18          Ana             
## 4 2011-01-23 21:07:34    -19.3      147. VR2W-104… E2           Ana             
## 5 2011-01-23 21:12:06    -19.3      147. VR2W-104… E2           Ana             
## 6 2011-01-24 10:57:27    -19.3      147. VR2W-104… E1           Ana             
## # ℹ 1 more variable: date <date>


lubridate

  • lubridate is an easy way to convert date and time data into a form that R can recognise
  • Allows for calculation of durations and intervals between dates.
  • Recognises multiple date time formats and parses them to a standardised ‘POSIX’ format that R uses (ymd for dates; ymd_hms for date and time parsing)
  • These features are very important when working with spatio-temporal datasets like telemetry data

Currently, the “date_time” column in our blacktip dataset is in Coordinated Universal Time (UTC). Let’s use lubridate to convert this column into local date time (i.e. UTC + 10 hours), keeping it in the ‘POSIX’ format.

library(lubridate)

blacktip <-
  blacktip %>% 
  mutate(local_date_time = with_tz(date_time, tzone = "Australia/Brisbane")) %>% # convert to local "Australia/Brisbane" date time (UTC + 10hrs)
  mutate(date = date(local_date_time)) # use lubridate to update local date time into a date field
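
It can help to see what with_tz() does to a single timestamp. The sketch below takes the first detection time shown in the head(blacktip) output earlier and converts it by hand. It illustrates that the instant in time is unchanged, only the clock time on display, and that the calendar date can differ between time zones.

```r
library(lubridate)

# First detection time from the head(blacktip) output, recorded in UTC
utc_time <- ymd_hms("2011-01-23 16:41:31", tz = "UTC")

# Same instant, displayed in Brisbane local time (UTC + 10 hours)
local_time <- with_tz(utc_time, tzone = "Australia/Brisbane")

format(local_time, "%Y-%m-%d %H:%M:%S")  # "2011-01-24 02:41:31"

# The two objects refer to the same moment in time
utc_time == local_time

# ...but the calendar date is different in the two time zones
date(utc_time)
date(local_time)
```

This is why the date column is recalculated from local_date_time in the code above: a detection late in the UTC day falls on the following day in local time.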


Data visualisation using ggplot2

ggplot2 is a powerful data visualization package for the R programming language. The package makes it very easy to generate some very impressive figures and utilise a range of colour palettes, taking care of many of the fiddly details that can make plotting graphs in R a hassle.

The system provides mappings from your data to aesthetics which are used to construct beautiful plots.

Documentation for ggplot2 can be found here.

There is also this awesome cheatsheet for ggplot2.


ggplot2 grammar

The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want.

Building blocks of a graph include:

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformations
  • scales
  • coordinate system
  • position adjustments
  • faceting

Aesthetic Mapping

In ggplot2, aesthetic means “something you can see”. Aesthetic mapping (i.e., with aes()) only says that a variable should be mapped to an aesthetic. It doesn’t say how that should happen. For example, when mapping a variable to shape with aes(shape = x) you don’t say what shapes should be used. Similarly, aes(color = z) doesn’t say what colors should be used. Describing what colors/shapes/sizes etc. to use is done by modifying the corresponding scale.

Aesthetics in ggplot2 include:

  • position (i.e., on the x and y axes)
  • color (“outside” color)
  • fill (“inside” color)
  • shape (of points)
  • linetype
  • size

Each type of geom accepts only a subset of all aesthetics–refer to the geom help pages to see what mappings each geom accepts. Aesthetic mappings are set with the aes() function.
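
The split between mapping and scale can be sketched with a few lines of code. In the example below the data are made up for illustration, and the two colours are an arbitrary choice. aes(colour = shark) only states that colour depends on the shark; scale_colour_manual() is where the actual colours are chosen.

```r
library(ggplot2)

# Made-up daily detection counts for two sharks
toy <- data.frame(
  day = c(1, 2, 3, 1, 2, 3),
  detections = c(5, 9, 4, 2, 6, 8),
  shark = c("Ana", "Ana", "Ana", "Bruce", "Bruce", "Bruce")
)

p <- ggplot(toy, aes(x = day, y = detections, colour = shark)) + # mapping: what varies with colour
  geom_point() +
  scale_colour_manual(values = c(Ana = "darkorange", Bruce = "steelblue")) # scale: which colours to use

print(p)
```

Removing the scale_colour_manual() line still gives a coloured plot, because ggplot2 falls back on its default colour scale.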


Geometric Objects (geom)

Geometric objects are the actual marks we put on a plot. Examples include:

  • points (geom_point, for scatter plots, dot plots, etc)
  • lines (geom_line, for time series, trend lines, etc)
  • boxplot (geom_boxplot, for, well, boxplots!)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.

You can get a list of available geometric objects using the code below:

help.search("geom_", package = "ggplot2")


In the script below we call the dataset we have just made (blacktip) and then pipe it into the ggplot() function. We then tell ggplot that we want to plot a box plot.

library(ggplot2)   

blacktip %>%
  group_by(transmitter_name, date) %>% 
  summarise(daily_detections = n()) %>% # use summarise to calculate numbers of detections per day per animal
  ggplot(mapping = aes(x = transmitter_name, y = daily_detections)) + # define the aesthetic map (what to plot)
  xlab("Tag") + ylab("Number of detections per day") +
  geom_boxplot() # define the geometric object (how to plot it).. in this case a boxplot
## `summarise()` has grouped output by 'transmitter_name'. You can override using
## the `.groups` argument.

A common plot used in passive acoustic telemetry to assess temporal patterns in detection is the ‘abacus plot’. This plot can help quickly assess which animals are being detected consistently within your array, and identify any temporal or spatial patterns in detection frequency.

We can adapt the above script to create an abacus plot using our blacktip dataset.

blacktip %>%
  ggplot(mapping = aes(x = local_date_time, y = transmitter_name)) + 
  xlab("Date") + ylab("Tag") +
  geom_point()

Additional Task: Now that you’ve plotted the raw dates, can you figure out how to plot daily detections of our tagged sharks?

We can also use the facet_wrap() function to explore the detection data further and look at how animals were detected at each receiver.

blacktip %>%
  ggplot(mapping = aes(x = local_date_time, y = station_name)) + 
  xlab("Date") + ylab("Receiver station") +
  geom_point() +
  facet_wrap(~transmitter_name, nrow=1) # This time plot a separate panel for each shark, arranged in one row

Additional Task: Can you now plot this with a different colour for each shark?




1.3. Working with Spatial objects using the sf and mapview R packages







Session 2

Working with satellite telemetry data


2.1. Understanding the data structure from satellite tags


GPS data





ARGOS data





GLS data





2.2. Processing satellite tag data using the aniMotum package





2.3. Visualising satellite tag data using the ggspatial package






Session 3

Working with passive acoustic telemetry data

In this session we will take a brief walk through how we can use the VTrack R package to quickly format and analyse large acoustic tracking datasets. A lot of the functions here do similar analyses to the ones you learned in the previous session. We will then go through a new R package called remora that helps users to interactively explore their data, as well as append environmental data to detections to further your analysis of animal movements.

Here we are just arming you with multiple tools to be able to analyse your data. Which analysis (and thus R package) is most appropriate for your dataset will depend on your study design, research questions and available data. For this session, we will use the same data you worked on in Session 2; however, we will use the IMOS Workshop_Bull-shark-sample-dataset in the data folder you have downloaded.



3.1. Understanding the data structure from acoustic telemetry data





3.2. Using the VTrack R package to explore patterns in animal detections and dispersal





3.3. Using the remora R package to interactively explore your telemetry data







Signoff!

This is where we end our R workshop! There may have been a few bits of code that you had trouble with or need more time to work through. We encourage you to discuss these with us as well as others at the workshop to help get a handle on the R code.


If you have any comments or queries regarding this workshop, feel free to contact me:

Happy Tracking!